DETAILED ACTION
NOTE: This action has not been made final due merely to recent court decisions 
regarding claims not being tied to a statutory class or transforming subject matter. See 
35 U.S.C. 101 rejection below. 

Response to Arguments 

1. Applicant's arguments filed 02/03/2009 have been fully considered but they are
not persuasive. 

Argument (page 12 paragraph 2, page 14 paragraph 2, and page 17 
paragraph 2): 

• "The Applicants submit that there is no discussion in the cited paragraph 
of creating a gender-independent speech recognition model based on a 
male and female set of recorded phonemes training data if the difference 
in model information is insignificant" 
Response to argument: 

Examiner takes the position that Chang in fact appears to explicitly teach the aid 
of gender based modeling in order to generate gender-independent modeling, 
wherein the use of a gender independence within the specification of the present 
invention (spec, page 4) is parallel to that of silence, wherein Examiner construes 
any sound other than gender to be functionally equivalent and equally effective to 
gender independence, as is well known in the art and explicitly taught by Chang. 
Chang teaches the superior use of gender dependent models to aid and improve 
independent models, wherein Chang teaches performing discriminative training on
the gender dependent model (male speaker cluster 102 and female speaker
cluster 104) in the second level of the tree. Because speakers of different
gender clusters have very different characteristics, we will not adjust parameters
across different gender clusters. That means that the discriminative training
performed on the parameters of the male speaker cluster 102 only uses speech
data uttered by male speakers. The discriminative training performed on the
parameters of the female speaker cluster 104 only uses speech data uttered by
female speakers. It is shown in Table 1 that the recognition result using the
gender-dependent model is superior to that using the speaker-independent
model. Because the gender-dependent model is a simple plain-structured
speaker cluster model, the speaker cluster model can readily manage recognition
problems caused by differences between speaker characteristics, improving the
recognition result of speaker-independent speech recognition (Col. 6 lines 5-27).
This clearly demonstrates gender independence developed from gender
dependence (i.e. "based on").

As far as insignificance is concerned, Examiner has incorporated Yang to further 
strengthen the teachings of Chang. Though Chang implicitly teaches 
comparison characteristics in models, Chang does not suggest the concept of 
insignificant difference, and rather teaches obvious differences from a broad 
sense, whereas Yang teaches insignificant or small comparison/differences 
explicitly, which is consistent with the present invention teaching large and small
distances/differences (spec, page 7 small and large distances). Yang teaches
well known speech recognition methods in which the shape of the vocal tract shows a
large difference from speaker to speaker, whereas its temporal pattern shows a small
difference. So if the difference
based on the shape of the vocal tract is somehow normalized, the speech of
specified speakers can be recognized using only the utterances of a small
number of speakers. The difference in the shape of the vocal tracts causes
different frequency spectra. One of the methods to normalize the spectral
difference among speakers is to classify voice input by matching it with phoneme
templates which are made for unspecified speakers. This operation provides
similarity, which does not depend very much on the differences among speakers.
Meanwhile, the temporal pattern of vocal tract is considered to have small
individual difference (Yang [0004]).

Yang enables the concept of identifying both small and large speech recognition 
differences in speech recognition and gives an example of gender based 
language recognition ([0064]). Thus it is obvious to combine the teachings of
Chang and Yang, as both teachings are within the scope of the claims and 
together explicitly demonstrate the results of the claim language and their 
limitations.

Claim Rejections - 35 USC § 101
2. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

Claims 1-5 and 17-20 are rejected under 35 U.S.C. 101 because: 
Claims 1-5 and 17-20 do not fall within one of the four statutory categories of 
invention. Supreme Court precedent 1 and recent Federal Circuit decisions 2 indicate 
that a statutory "process" under 35 U.S.C. 101 must (1) be tied to another statutory 
category (such as a particular apparatus), or (2) transform underlying subject matter 
(such as an article or material) to a different state or thing. While the instant claim(s) 
recite a series of steps or acts to be performed, the claim(s) neither transform 
underlying subject matter nor positively tie to another statutory category that 
accomplishes the claimed method steps, and therefore do not qualify as a statutory 
process. 

Claims 1 and 17 recite purely mental steps and would not qualify as a statutory 
process. In order to qualify as a statutory process, the method claim should positively 
recite the other statutory class to which it is tied (i.e. apparatus, device, product, etc.). 
For example, the method steps of claim 1 appear to recite mental steps such as
"generating speech recognition models" and do not identify an apparatus that performs
the recited method steps, such as computer executed steps as described in the
specification (present invention page 5 and fig. 1).

1 Diamond v. Diehr, 450 U.S. 175, 184 (1981); Parker v. Flook, 437 U.S. 584, 588 n.9 (1978); Gottschalk
v. Benson, 409 U.S. 63, 70 (1972); Cochrane v. Deener, 94 U.S. 780, 787-88 (1876).

2 In re Bilski, 88 USPQ2d 1385 (Fed. Cir. 2008).



Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1, 5, 6, 10, 11, 15, and 16 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Chang et al. US 6567776 B1 (hereinafter Chang) in view of Yang US 
20010010039 A1 (hereinafter Yang). 

Re claims 1, 6, 11, and 16, Chang teaches a method for generating speech
recognition models, the method comprising: 

converting speech spoken from a plurality of female speakers (Col. 1 lines 15-49) 
into a first set of recorded phonemes training data (Col. 5 line 45 - Col. 6 line 67); 

converting speech spoken from a plurality of male speakers (Col. 1 lines 15-49) 
into a male set of recorded phonemes training data (Col. 5 line 45 - Col. 6 line 67); 

receiving a female speech recognition model based on the female set of 
recorded phonemes training data (Col. 5 line 45 - Col. 6 line 67);

receiving a male speech recognition model based on the male set of recorded
phonemes training data (Col. 5 line 45 - Col. 6 line 67); 

determining a difference in model information between the first speech 
recognition model and the second speech recognition model (Col. 5 line 45 - Col. 6 line 
67); 

However, Chang fails to teach phoneme training data and creating a
gender-independent speech recognition model based on the female set of recorded
phonemes training data and the male set of recorded phonemes training data
(Col. 6 lines 5-27) if the difference in model information is insignificant.

Yang teaches very well known techniques of speech recognition, wherein
differences are evaluated between all voice types, wherein Yang teaches human speech
is generated according to a shape of vocal tract and its temporal transition. The shape
of vocal tract, which depends on the shape or size of the vocal organ, inevitably shows
individual differences. On the other hand, the pattern of time sequence of the vocal
tract, which also depends on an uttered word, shows a small individual difference.
Therefore, features of utterance should be divided into two factors: the shape of the 
vocal tract and its temporal pattern. The former shows large difference from speaker to 
speaker whereas the latter one shows small difference. So if the difference based on 
the shape of the vocal tract is somehow normalized, the speech of specified speakers 
can be recognized using only the utterances of a small number of speakers. The 
difference in the shape of the vocal tracts causes different frequency spectra. One of 
the methods to normalize the spectral difference among speakers is to classify voice
input by matching it with phoneme templates which are made for unspecified speakers.
This operation provides similarity, which does not depend very much on the differences 
among speakers. Meanwhile, the temporal pattern of vocal tract is considered to have 
small individual difference (Yang [0004]). 

Further, Yang teaches speech recognition method comprises the step of training 
a Phoneme Similarity Vector (PSV) model on the initial part to create an initial part 
model having trained initial part model parameters, the step of training a PSV on the 
final part to create a final part model having trained final part model parameter, the step 
of training a PSV on the training speech syllable to create a syllable model using the 
trained initial part parameter values and the trained final part parameter values as 
starting parameters for the syllable model, the step of operating on an object speech 
sample with the syllable model, the step of recognizing the object speech sample as an 
object speech syllable based on a degree of match of the object speech sample to the 
syllable model, and the step of representing the object speech sample as a Chinese 
character in accordance with the object speech syllable (Yang [0014]). 

Furthermore, with respect to distance comparison, Yang teaches a user creating 
a speech signal to accomplish a given task. In the second step, the spoken output is 
first recognized in that the speech signal is decoded into a series of phonemes that are 
meaningful according to the phoneme templates. The acoustic analysis portion 30 
analyses speech inputs and the extracted LPC (Linear Predictive Coding) cepstrum 
coefficients and delta power. The extracted parameters are matched with many kinds 
of phoneme templates, and static phoneme similarity and the first order regression
coefficients of phoneme similarity are calculated in the similarity calculation portion 40.
After that, the time sequence of those number of phoneme templates to define a 
dimensional similarity coefficient vectors and regression coefficient vectors can be 
obtained. In the similarity calculation portion 40, the Mahalanobis distance algorithm is
employed for distance measure, where covariance matrices for all of the phonemes are
assumed to be the same. The meaning of the recognized words is obtained by the post 
processor that uses a dynamic programming to match inputted word with the real word 
and the word having been previously recognized by phoneme similarity calculation 
(Yang [0036]). 
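
For illustration only, the distance measure summarized from Yang [0036] can be sketched as follows. This is a minimal sketch and not Yang's implementation: the template labels, the feature values, and the diagonal form of the shared covariance are assumptions made to keep the example short.

```python
import math

def mahalanobis_shared_diag(x, mean, shared_var):
    """Mahalanobis distance when all phonemes share one covariance.

    shared_var holds per-dimension variances; a diagonal covariance is an
    assumption of this sketch, not something stated in the cited passage.
    """
    return math.sqrt(sum((xi - mi) ** 2 / v
                         for xi, mi, v in zip(x, mean, shared_var)))

def nearest_template(x, templates, shared_var):
    """Return the label of the phoneme template closest to feature vector x."""
    return min(templates,
               key=lambda label: mahalanobis_shared_diag(x, templates[label], shared_var))

# Hypothetical two-dimensional phoneme templates (mean feature vectors).
templates = {"a": [1.0, 2.0], "i": [4.0, 0.0]}
shared_var = [1.0, 4.0]  # the same variances are reused for every phoneme

frame = [1.5, 1.0]
best = nearest_template(frame, templates, shared_var)
```

Because the covariance is the same for every phoneme, only the template means differ between candidates, which is what makes a single distance routine sufficient for all templates.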

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Chang to incorporate phoneme training 
data and creating a gender-independent speech recognition model based on the first 
set of recorded phonemes training data and the second set of recorded phonemes 
training data if the difference in model information is insignificant as taught by Yang to 
allow for the acquisition of various speech parameters from multiple speakers where 
phoneme templates are made for unspecified speakers, wherein temporal patterns and 
frequency spectra are analyzed to find the difference between speakers based on a 
vocal tract (i.e. a male and female can have different voice features) (Yang [0004]). 

Re claims 5, 10, and 15, Chang teaches the method of claim 1, wherein the female
speech recognition model, male speech recognition model, and gender-independent
speech recognition model (Col. 5 line 45 - Col. 6 line 67) are Gaussian mixture models.

However, Chang fails to teach speech recognition models that are Gaussian
mixture models.

Yang teaches the use of the continuous mixture Gaussian density models. With 
these methods, spectral parameters are used in speech recognition as a feature 
parameter and an enormous number of speakers are generally required for training. It 
also costs very large memory in order to get high recognition rate. If the standard 
patterns for speaker independent speech recognition can be produced from a small 
number of speakers, the size of computation will be much smaller than usual. 
Therefore, human power and computation are saved and speech recognition technique 
can be easily handled to various applications. For the purpose mentioned above, we 
proposed our invention of speech recognition apparatus using the similarity vectors as 
feature parameters. In this method, word templates trained with a small number of 
speakers yield high recognition rates in speaker-independent recognition. To realize 
the speech recognition technology in real applications, speech recognizer must be 
robust to noisy environments and spot intended words from background noise and 
unintended utterances. Furthermore, speech recognizer must retain high quality 
performance on portable devices. For these reasons, our invention was focused on 
small-size programming code but high accuracy rate for portable device which can be 
built-in a Chinese speech recognition system (Yang [0007]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Chang to incorporate gender-independent 
speech recognition models that are Gaussian mixture models as taught by Yang to allow
for a less costly approach that produces higher accuracy for a speech recognition
system, wherein recognition rates are based on speaker-independent recognition and 
modeling (Yang [0007]). 

5. Claims 2-4, 7-9, and 12-14 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chang et al. 6567776 (hereinafter Chang) in view of Yang US 
20010010039 A1 (hereinafter Yang) and further in view of Kanevsky et al. US 
6529902 (hereinafter Kanevsky). 

Re claims 2, 7, and 12, Chang in view of Yang fails to teach the method of claim
1, wherein whether the model information is insignificant is based on a threshold model
quantity. 

Kanevsky teaches the Kullback-Leibler distance between any two topics is at 
least h, where h is some sufficiently large threshold (Kanevsky Col. 5, lines 9-11). 
Further, Kanevsky teaches using Kullback-Leibler distance, one can check which pairs 
of topics are sufficiently separated from each other. Topics that are close in this metric 
could be combined together (Kanevsky Col. 12, lines 44-47). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Chang in view of Yang to incorporate the 
model information is insignificant is based on a threshold model quantity as taught by 
Kanevsky to allow for an improved language modeling for off-line automatic speech 
decoding and machine translation (Kanevsky Col. 2, lines 50-52).

Re claims 3, 8, and 13, Chang in view of Yang fails to teach the method of claim
1, wherein determining the difference in model information includes calculating a 
Kullback Leibler distance between the first speech recognition model and second 
speech recognition model. 

Kanevsky et al. teaches that for two different sets, one can define a Kullback- 
Leibler distance using the frequencies of the sets. [With the distance] one can check 
which pairs of topics are sufficiently separated from each other. Topics that are close in 
this metric could be combined together (Kanevsky Col. 12, lines 42-47). 
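
For illustration only, the kind of check described in the passages cited from Kanevsky can be sketched as follows. This is a minimal sketch under stated assumptions, not Kanevsky's method: the function names, the add-one smoothing, the symmetrised form of the distance, and the threshold value are all hypothetical.

```python
import math

def to_distribution(counts, vocabulary, smoothing=1.0):
    """Turn raw frequency counts into a smoothed probability distribution."""
    total = sum(counts.get(w, 0) + smoothing for w in vocabulary)
    return {w: (counts.get(w, 0) + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over the same set of keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def symmetric_kl(p, q):
    """Symmetrised form, so the value does not depend on argument order."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def should_combine(counts_a, counts_b, threshold):
    """True when the two sets are closer than the (assumed) threshold."""
    vocabulary = sorted(set(counts_a) | set(counts_b))
    p = to_distribution(counts_a, vocabulary)
    q = to_distribution(counts_b, vocabulary)
    return symmetric_kl(p, q) < threshold

# Hypothetical frequency counts for three sets.
set_a = {"x": 50, "y": 50}
set_b = {"x": 48, "y": 52}
set_c = {"x": 95, "y": 5}

close_pair = should_combine(set_a, set_b, threshold=0.1)
far_pair = should_combine(set_a, set_c, threshold=0.1)
```

In this sketch, sets whose distance falls below the threshold are candidates for combination, while sets at or above it are treated as sufficiently separated.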

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Chang in view of Yang to incorporate the 
determining the difference in model information includes calculating a Kullback Leibler 
distance between the first speech recognition model and second speech recognition 
model as taught by Kanevsky to allow for an improved language modeling for off-line 
automatic speech decoding and machine translation (Kanevsky Col. 2, lines 50-52). 

Re claims 4, 9, and 14, Chang in view of Yang fails to teach the method of claim 
3, wherein whether the model information is insignificant is based on a threshold 
Kullback Leibler distance quantity. 

Kanevsky teaches that the Kullback-Leibler distance (Kanevsky Col. 5, lines 9-11)
between any two topics is at least h, where h is some sufficiently large threshold, and
also teaches (Kanevsky Col. 12, lines 44-47) that, using the Kullback-Leibler
distance, one can check which pairs of topics are sufficiently separated from each other,
and that topics that are close in this metric could be combined together.

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Chang in view of Yang to incorporate 
whether the model information is insignificant is based on a threshold Kullback Leibler 
distance quantity as taught by Kanevsky to allow for an improved language modeling for 
off-line automatic speech decoding and machine translation, wherein a sufficiently large 
threshold indicates separate or combinational probabilities (Kanevsky Col. 2, lines 50- 
52). 



6. Claims 17-27 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Wark US 20030231775 (hereinafter Wark) in view of Chang et al. 6567776 
(hereinafter Chang) and further in view of Yang US 20010010039 A1 (hereinafter 
Yang). 

Re claims 17, 21, and 24, Wark teaches a system for recognizing speech data
from an audio stream originating from one of a plurality of data classes ([0094]), the system
comprising: 

a computer processor; 

a receiving module configured to receive a current feature vector of the audio 
stream ([0094]);

a first computing module configured to compute a current vector probability
([0060]) that the current feature vector belongs to one of the plurality of data classes
([0094]); 

a second computing module configured to compute an accumulated confidence 
level that the audio stream belongs to one of the plurality of data classes based on the 
current vector probability ([0060]) and on previous vector probabilities ([0146] & Fig. 4, 
adjacent, previous and current segment/frame); 

a weighing module ([0142]) configured to weigh class models based on the 
accumulated confidence ([0146]); and 

a recognizing module configured to recognize the current feature vector ([0094]) 
based on the weighted class models ([0130]).

However, Wark fails to teach a plurality of data classes that
include a first speech recognition model based on recorded phonemes originating from 
a first set of speakers, a second speech recognition model based on recorded 
phonemes from a second set of speakers, and a third speech recognition model based 
on recorded phonemes originating from both the first and second set of speakers having 
insignificant differences in information. 

Chang teaches that it is well known in the related art that speaker cluster
models have been applied to speaker-independent speech recognition and speaker
adaptation. Although used in different application fields, the speaker cluster models are
built in the same training phases. A training phase starts with dividing speakers into
different speaker clusters. Then a cluster-dependent model is independently trained for
each speaker cluster by using the speech data of the speakers belonging to the cluster. 
The collection of all cluster-dependent models then forms a speaker cluster model. 
Most approaches in building speaker cluster models are focused on means of dividing 
speakers into clusters, especially in finding measurement of similarities across 
speakers. Some speaker clustering methods reported in articles of the related art are 
as follows: 1. Using acoustic distances across speakers to measure similarities across
speakers (Chang Col. 1 lines 15-49). 

Further, Chang teaches speaker based modeling represented in tree form for
purposes of explanation, wherein in the first level (root) of the tree we use all of the 
speech data to train a speaker-independent model. All speakers are then clustered 
according to gender. They are clustered into the male speaker cluster 102 and female 
speaker cluster 104 to train a gender-dependent model. This is the second level of the 
tree. Finally, the speakers within each gender group are clustered into two speaker 
clusters. For example, the male speaker cluster 102 is clustered into the speaker 
clusters M1 112 and M2 114, respectively. The female speaker cluster 104 is clustered
into the speaker clusters F1 122 and F2 124, respectively. Hence, the third level of the
tree has four clusters. In this step, we use acoustic distances across speakers to 
measure similarities across speakers. (Chang Col. 4 line 56 - Col. 5 line 25). 
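
For illustration only, the three-level tree summarized from Chang can be written out as a nested structure. This is a minimal sketch: the dictionary layout and the helper function are assumptions, and only the cluster reference numerals come from the passage above.

```python
# Speaker cluster tree; reference numerals follow the passage above.
cluster_tree = {
    "100 (speaker-independent)": {
        "102 (male)": {"112 (M1)": {}, "114 (M2)": {}},
        "104 (female)": {"122 (F1)": {}, "124 (F2)": {}},
    }
}

def clusters_per_level(tree):
    """Count the clusters at each level of the tree, root level first."""
    counts = []
    level = [tree]
    while True:
        # Collect the child dictionaries of every node on the current level.
        children = [sub for node in level for sub in node.values()]
        if not children:
            break
        counts.append(len(children))
        level = children
    return counts

level_counts = clusters_per_level(cluster_tree)
```

The counts correspond to one speaker-independent root, two gender clusters, and four speaker clusters in the third level.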

Furthermore, Chang teaches a speaker-independent model, which is built using 
maximum likelihood as the training criteria and is the first level (cluster 100) of the 
speaker cluster model, to recognize the speech signal. Its result is used for comparing
with the results of other experiments. Because this level only comprises one speaker
cluster, the result is the same regardless of the value of ξ. B. Further adjust the
parameters of the model used in experiment A (cluster 100) using the discriminative 
training method. It is shown in Table 1 that a better recognition result is achieved using 
the discriminative training method. Because the training method of the speaker cluster 
model introduced by the present invention uses the discriminative training method, the 
recognition model used for comparison is also established by using the discriminative 
training method. However, the discriminant function g_i of the present invention is
different from the discriminant function h_i of the related art. C. Perform
discriminative training on the gender dependent model (male speaker cluster 102 and
female speaker cluster 104) in the second level of the tree. Because speakers of
different gender clusters have very different characteristics, we will not adjust 
parameters across different gender clusters. That means that the discriminative training 
performed on the parameters of the male speaker cluster 102 only uses speech data 
uttered by male speakers. The discriminative training performed on the parameters of 
the female speaker cluster 104 only uses speech data uttered by female speakers. It is 
shown in Table 1 that the recognition result using the gender-dependent model is 
superior to that using the speaker-independent model. Because the gender-dependent 
model is a simple plain-structured speaker cluster model, the speaker cluster model can 
readily manage recognition problems caused by differences between speaker 
characteristics, improving the recognition result of speaker-independent speech 
recognition (Chang Col. 5 line 45 - Col. 6 line 67).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Wark to incorporate a plurality of data 
classes that include a first speech recognition model based on recorded phonemes 
originating from a first set of speakers, a second speech recognition model based on 
recorded phonemes from a second set of speakers as taught by Chang to allow for the 
training of a speaker independent model based on gender dependent models, wherein 
recognition results are improved where problems due to differences in speaker 
characteristics are minimized to enhance modeling and training (Chang Col. 5 line 45 - 
Col. 6 line 67). 

However, Wark in view of Chang fails to teach phoneme training data and
creating a gender-independent speech recognition model based on the first set of 
recorded phonemes training data and the second set of recorded phonemes training 
data if the difference in model information is insignificant. 

Yang teaches very well known techniques of speech recognition, wherein
differences are evaluated between all voice types, wherein Yang teaches human speech
is generated according to a shape of vocal tract and its temporal transition. The shape
of vocal tract, which depends on the shape or size of the vocal organ, inevitably shows
individual differences. On the other hand, the pattern of time sequence of the vocal
tract, which also depends on an uttered word, shows a small individual difference.
Therefore, features of utterance should be divided into two factors: the shape of the
vocal tract and its temporal pattern. The former shows large difference from speaker to
speaker whereas the latter one shows small difference. So if the difference based on
the shape of the vocal tract is somehow normalized, the speech of specified speakers 
can be recognized using only the utterances of a small number of speakers. The 
difference in the shape of the vocal tracts causes different frequency spectra. One of 
the methods to normalize the spectral difference among speakers is to classify voice 
input by matching it with phoneme templates which are made for unspecified speakers. 
This operation provides similarity, which does not depend very much on the differences 
among speakers. Meanwhile, the temporal pattern of vocal tract is considered to have 
small individual difference (Yang [0004]). 

Further, Yang teaches speech recognition method comprises the step of training 
a Phoneme Similarity Vector (PSV) model on the initial part to create an initial part 
model having trained initial part model parameters, the step of training a PSV on the 
final part to create a final part model having trained final part model parameter, the step 
of training a PSV on the training speech syllable to create a syllable model using the 
trained initial part parameter values and the trained final part parameter values as 
starting parameters for the syllable model, the step of operating on an object speech 
sample with the syllable model, the step of recognizing the object speech sample as an 
object speech syllable based on a degree of match of the object speech sample to the 
syllable model, and the step of representing the object speech sample as a Chinese 
character in accordance with the object speech syllable (Yang [0014]). 

Furthermore, with respect to distance comparison, Yang teaches a user creating 
a speech signal to accomplish a given task. In the second step, the spoken output is
first recognized in that the speech signal is decoded into a series of phonemes that are
meaningful according to the phoneme templates. The acoustic analysis portion 30 
analyses speech inputs and the extracted LPC (Linear Predictive Coding) cepstrum 
coefficients and delta power. The extracted parameters are matched with many kinds 
of phoneme templates, and static phoneme similarity and the first order regression 
coefficients of phoneme similarity are calculated in the similarity calculation portion 40. 
After that, the time sequence of those number of phoneme templates to define a 
dimensional similarity coefficient vectors and regression coefficient vectors can be 
obtained. In the similarity calculation portion 40, mahalanobis' distance algorithm is 
employed for distance measure, where covariance matrixes for all of the phonemes are 
assumed to be the same. The meaning of the recognized words is obtained by the post 
processor that uses a dynamic programming to match inputted word with the real word 
and the word having been previously recognized by phoneme similarity calculation 
(Yang [0036]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Wark in view of Chang to incorporate phoneme training
data and creating a gender-independent speech recognition model based on the first 
set of recorded phonemes training data and the second set of recorded phonemes 
training data if the difference in model information is insignificant as taught by Yang to 
allow for the acquisition of various speech parameters from multiple speakers where 
phoneme templates are made for unspecified speakers, wherein temporal patterns and
frequency spectra are analyzed to find the difference between speakers based on a
vocal tract (i.e. a male and female can have different voice features) (Yang [0004]). 

Re claims 18, 22, and 25, Wark teaches the method of claim 17, wherein computing the current
vector probability ([0060]) includes estimating a posteriori class probability for the 
current feature vector ([0146] & Fig. 4, adjacent, previous and current segment/frame). 

Re claims 19, 23, and 26, Wark teaches the method of claim 17, wherein computing the
accumulated confidence level further comprises weighing the current vector ([0094])
probability ([0060]) more than the previous vector probabilities ([0146] & Fig. 4, 
adjacent, previous and current segment/frame). 
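
For illustration only, one way an accumulated confidence level could weigh the current vector probability more than the previous vector probabilities is sketched below. This is a hypothetical recursive weighting chosen for brevity; neither the update rule nor the weight value is taken from Wark or from the claims.

```python
def accumulate_confidence(probabilities, current_weight=0.7):
    """Running confidence that weights the newest probability most heavily.

    With current_weight above 0.5, the current vector probability counts for
    more than everything accumulated from the previous vectors combined.
    Returns None when no probabilities are supplied.
    """
    if not 0.5 < current_weight <= 1.0:
        raise ValueError("current_weight must be in (0.5, 1.0]")
    confidence = None
    for p in probabilities:
        if confidence is None:
            confidence = p  # first vector: nothing accumulated yet
        else:
            confidence = current_weight * p + (1.0 - current_weight) * confidence
    return confidence

# Hypothetical per-vector class probabilities, oldest first.
history = [0.2, 0.2, 0.9]
confidence = accumulate_confidence(history)
plain_mean = sum(history) / len(history)
```

In this sketch the most recent probability pulls the accumulated value well above the unweighted mean of the same three probabilities.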

Re claims 20 and 27, Wark teaches the method of claim 17, further comprising determining if
another feature vector is available for analysis ([0094]). 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Michael C. Colucci whose telephone number is
(571)-270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm,
Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil can be reached on (571)-272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 